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Recent clinical trials using antibodies with low toxicity 
and high efficiency have raised expectations for the devel- 
opment of next-generation protein therapeutics. However, 
the process of obtaining therapeutic antibodies remains 
time consuming and empirical. This review summarizes 
recent progresses in the field of computer-aided antibody 
development mainly focusing on antibody modeling, 
which is divided essentially into two parts: (i) modeling 
the antigen-binding site, also called the complementarity 
determining regions (CDRs), and (ii) predicting the rela- 
tive orientations of the variable heavy (Vu) and light (Vl) 
chains. Among the six CDR loops, the greatest challenge 
is predicting the conformation of CDR-H3, which is the 
most important in antigen recognition. Further computa- 
tional methods could be used in drug development based 
on crystal structures or homology models, including 
antibody -antigen dockings and energy calculations with 
approximate potential functions. These methods should 
guide experimental studies to improve the affinities and 
physicochemical properties of antibodies. Finally, several 
successful examples of in silico structure-based antibody 
designs are reviewed. We also briefly review structure- 
based antigen or immunogen design, with application to 
rational vaccine development. 

Keywords: antibody design/antibody engineering/protein 
therapeutics/vaccine design 



Introduction 

Computational methods are almost universally accepted as 
important tools for the invention of small molecule drugs, 
helping with tasks such as optimizing affinity for a target, 
minimizing off-target effects and optimizing pharmacoki- 
netic properties. In these contexts, the computational 
methods are generally not considered substitutes for empiric- 
al testing, but rather a way to generate testable hypotheses, 
helping to interpret and guide experiments. The situation 



with antibody therapeutics is strikingly different in this 
regard. While modeling has in fact contributed to the design 
of therapeutic antibodies, notably in early attempts at human- 
izing antibodies, overall the potential impact of computation- 
al methods is not as well defined, and the tools not as well 
developed and less broadly employed, than in small molecule 
drug discovery. Here we review progress in the development 
of computational methods that may ultimately be routinely 
used in antibody drug discovery. 

Because we encounter a large variety of foreign mole- 
cules in daily life, 'diversity' is a key concept in the adap- 
tive immune system in which antibodies take a major role. 
The sequences, structures and functions of antibodies have 
been extensively studied due to their growing importance as 
therapeutics (Carter, 2006; Reichert, 2008; Nelson and 
Reichert, 2009) and research tools (Nogi et al., 2008; Hattori 
et al., 2010). At the sequence level, antibody diversity is 
generated by (i) recombination of V(J)D gene segments 
(Tonegawa, 1983) and (ii) somatic mutations during anti- 
body maturation, which leads to higher affinity (Besmer 
et al., 2004; Neuberger, 2008). At a structural level, anti- 
body diversity is manifested primarily in the antigen-binding 
sites, comprised of six loops, and by the relative orientations 
between the light chain and heavy chain variable domains 
(Vl and Vh). 

Our growing understanding of sequence- structure rela- 
tionships in antibodies, and advances in computational 
protein modeling, has enabled progress toward computational 
methods that can assist in re-designing antibodies for higher 
affinity or other desired modifications (Rosenberg and 
Goldblum, 2006; Lippow and Tidor, 2007; Karanicolas and 
Kuhlman, 2009). In Fig. 1, we summarize several ways in 
which computational methods can be deployed in the context 
of antibody design. One central goal is to accurately predict 
the structures of antibodies from their sequences, a special 
case of the comparative protein modeling problem, which is 
valuable due to the challenges and expenses associated with 
experimental structure determination. In cases where the 
binding interaction between antibody and antigen is 
unknown, protein -protein docking methods can be used to 
predict the complex structure, although this remains challen- 
ging, especially when using homology-modeled structures 
for either the antibody or the antigen (Gray, 2006). Finally, 
using experimentally determined or predicted structures of 
the antibody -antigen complex, computational methods can 
be used to predict mutations that may improve binding affin- 
ity, specificity or other properties such as solubility. 

In addition to antibody designs, designing better antigens 
or immunogens is also expected to elicit neutralizing anti- 
bodies (NAbs) for viruses, such as HIV and influenza. 
Antigens could work as vaccines only if they elicit appropri- 
ate antibodies without showing viral activity. In traditional 
vaccine approaches, antigens are empirically identified and 
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are usually isolated, inactivated or attenuated, and then 
injected so that they cannot cause undesirable infections, but 
can induce desired antibody response (Rinaudo et al, 2009). 
Although the principle has not changed dramatically, the ac- 
cumulation of experimental data and complex structures of 
NAbs- antigens provides us with a better understanding of 
the structural basis of antigen recognition, and a structure- 
based rational vaccine design is now becoming possible. 

In this review, we discuss successes and challenges of ap- 
plying computational methods in each of these areas, focus- 
ing on areas that we view as particularly critical to achieving 
routine, successful use of computational methods in antibody 
design, especially complementarity determining region 
(CDR)-H3 modeling and predicting the effects of somatic or 
designed mutations during antibody optimization. Recent 
successes of structure-based antigen design based on protein 
scaffolds are also described. Other topics, such as canonical 
structures (Al-Lazikani et al, 1997; Morea et al., 2000; 
Vargas-Madrazo and Paz-Garcia, 2002; Chailyan et al, 
2011a), especially of CDR-L3 (Kuroda et al, 2009); the anti- 
body numbering schemes (Honegger and Pluckthun, 2001; 
Abhinandan and Martin, 2008); B-cell epitope predictions 
(Ponomarenko and Bourne, 2007; Evans, 2008); humaniza- 
tion (Almagro and Fransson, 2008); and details of protein- 
protein docking (Gray, 2006) are reviewed elsewhere. 



Antibody CDR modeling 

A core challenge for computational antibody engineering is 
predicting the structure of the antibody from sequence. This 
is of course a specific case of the more general protein struc- 
ture prediction problem (Baker and Sali, 2001; Zhang, 2008), 
but antibodies present both special advantages and special 
challenges relative to other proteins. Before discussing 
methods for predicting antibody structures, we first briefly 
review their structural organization. 



An antibody consists of two types of chains, the light 
and heavy chains, each of which is composed of multiple 
domains, all of which assume the common immunoglobulin 
(Ig)-fold (Murzin et al, 1995; Chothia et al, 1998). The 
antigen-binding site is located in the 'variable' domains of 
each chain, Vl and Vh- The antigen-binding sites are called 
CDRs and are composed primarily of six 'hypervariable' 
loops, three on the light chain (LI, L2 and L3) and three on 
the heavy chain (HI, H2 and H3). The remainder of the vari- 
able domains is structurally well conserved at the backbone 
level, and aligning antibody sequences is straightforward. For 
these reasons, a primary focus of antibody modeling is pre- 
dicting the conformations of the CDR loops from their 
sequences (Whitelegg and Rees, 2004). 

In the 1980s, antibody modeling started with the unexpect- 
ed discovery that most of the CDR loops (all but H3) adopt a 
limited number of conformations called canonical structures 
(Chothia and Lesk, 1987; Chothia et al, 1989; Tramontano 
et al, 1989, 1990; Tramontano and Lesk, 1992). The most 
probable canonical structures can be predicted based on the 
length of the loop and the identities of key residues within or 
outside the CDR regions. This knowledge has been used in 
humanization procedures (Riechmann et al, 1988) where key 
residues should be maintained when non-human CDRs are 
grafted onto human frameworks; otherwise, the conformation 
of the antigen-binding sites could change resulting in 
reduced antibody -antigen-binding affinity (Morea et al, 
2000; Almagro and Fransson, 2008). Canonical structures can 
be defined only when the same sequence motifs discovered 
from the Protein Data Bank (PDB) appear in the large se- 
quence database (Kuroda et al, 2009). Recent increases in 
the number of antibody crystal structures have made it pos- 
sible to define more canonical structures (Kuroda et al, 
2009; Teplyakov et al, 2010b; Chailyan et al, 2011a). 
Definitions of canonical structures are available in the litera- 
tures (Al-Lazikani et al, 1997; Morea et al, 2000). A new 
classification scheme for CDRs has also been proposed 
based on affinity propagation clustering with a large number 
of crystal structures (North et al, 201 1). 

Adopting these empirically determined sequence -structure 
relationships has made it possible to model the LI, L2, L3, 
HI and H2 loops with high accuracy [backbone root mean 
square deviation (RMSD) between native structures and tem- 
plates are usually < 1 .0 A] and subsequent work has focused 
mainly on predicting CDR-H3 structures (Shirai et al, 1996; 
Morea et al, 1997, 1998; Oliva et al, 1998). CDR-H3 loops 
are located in the center of the antigen-binding site, play an 
important role in antigen recognitions and gain more muta- 
tions during affinity maturation than other portions of anti- 
bodies (Clark et al, 2006b). They assume diverse 
conformations, limiting the ability to derive simple rules for 
their sequence- structure relationships (Fig. 2A). Structural 
changes in CDR-H3 also occur upon antigen binding 
(Fig. 2B), which makes the problem more complex. These 
challenges have led to many experimental and computational 
studies to elucidate sequence -structure -function relationships 
of CDR-H3 (ZemUn et al, 2003; Brooks et al, 2010; Julien 
et al, 2010; Ofek et al, 2010a,b). 

Although their overall conformations are diverse, the con- 
formations of the C-terminal base regions of CDR-H3 are 
limited and assume 'kinked' or 'extended' forms. Applying 
sequence -structure rules for this portion of the loop 
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Fig. 2. Structural diversity of antibodies. (A) Variability of six CDRs. (B) 
Structural changes of CDR-H3 [root mean square deviation (RMSD) of 
backbone N, Ca, C and O atoms is 2.3 A] and rearrangement of V^/Vh 
domain orientation. Anti-HIV-1 peptide antibody, FabSO. 1 (magenta (IGGI) 
and white (IGGC) for antigen bound and free structures, respectively). The 
white surface model is the antigen, HIV-1 V3 loop peptide. All graphics of 
protein structures in this article are generated using UCSF Chimera (http 
://www.cgl. ucsf.edu/chimera). 



simplifies the problem by reducing the computational search 
space for the loop modeling. Shirai et al. (1996, 1999) first 
proposed H3-rules, which identified the base form and 
(3-hairpin characteristics from the sequences. Recent 
advances in structural genomics have increased our knowl- 
edge of sequence -structure relationships in CDR-H3, 
leading to refinements of the H3-rules (Koliasnikov et al, 
2006; Kuroda et ai, 2008). 

In the absence of simple sequence -structure relationships 
for the remainder of CDR-H3, the conformations can be pre- 
dicted by exploiting the great many methods that have been 
developed for the more general problem of loop prediction. 
These methods generally work by first enumerating large 
numbers of plausible loop conformations, and then predicting 
which of these is likely to be correct by using conformational 
energy or a scoring function derived from the database of 
crystal structures (Soto et al, 2008). 

One approach to enumerating plausible loop conforma- 
tions for CDR-H3, and other loops for which no simple 
sequence-structure relationships exist, is to mine the known 
loop structures in the PDB (Jones and Thirup, 1986). For 
example. Levy and coworkers surveyed the coverage of short 
(7-, 9- and 15-residue) fragments in the PDB against frag- 
ments derived from a novel protein, and showed that most of 



them are covered with reasonable accuracy (RMSD < 2.0 A) 
(Du et al, 2003). Some other studies have also suggested 
that existing loop conformations in the PDB are sufficient to 
model most loops (Fernandez-Fuentes and Fiser, 2006; Choi 
and Deane, 2010). 

With respect to CDR-H3 specifically, Kuroda et al. devel- 
oped a knowledge-based sampling method using fragments 
derived from both CDR-H3 loops in antibodies and loops in 
non-homologous proteins (Kuroda et al., submitted for publi- 
cation). They have also demonstrated that the usefulness of 
the database approach is enhanced by energy-based refine- 
ment. Similarly, Gray and coworkers proposed antibody- 
modeling software, RosettaAntibody, based on their Rosetta 
design suite (Rohl et al., 2004; Sircar et ai, 2009). They 
modeled CDR-H3 loops using fragment assembly and the 
cyclic coordinate descent approach and minimized them 
using the Rosetta protocol (Sivasubramanian et al, 2009). 
They also showed that antibody models constructed by their 
protocols could be used in subsequent computational 
docking procedures using RosettaDock (Gray et al, 2003; 
Wang et al, 2007), although they noted a number of chal- 
lenges mainly associated with CDR-H3 modeling. 

A distinct approach to enumerating possible loop confor- 
mations is the so-called 'ab initio'' approach, which uses 
generic 'Ramachandran' preferences to generate possible 
backbone conformations, and not specific loop conformations 
from the PDB. In one of the earliest antibody modeling 
methods, Martin et al. combined database searching and ab 
initio loop prediction using the CONGEN program 
(Bmccoleri et al, 1988; Martin et al, 1989). A neural 
network machine learning method was also used to tackle 
the H3 modeling problem (Reczko et al, 1995). 

More recently, Jacobson and coworkers have developed an 
ab initio loop modeling protocol, which searches conform- 
ational space by backbone torsion-angle sampling with sub- 
sequent energy-based refinement and scoring based on the 
all-atom optimized potentials for liquid simulations force 
field and an implicit solvent model (Jacobson et al, 2004; 
Sellers et al, 2008). This method has been shown to perform 
well for loop modeling in generic proteins (Rossi et al, 
2007, 2009) and the protocol can be applied to CDR-H3 
modeling with accuracies high enough that the models could 
be used as surrogates for experimental structures in many 
applications (Sellers et al, 2010). 

In general, loop prediction becomes more difficult as the 
loops become longer, especially for those longer than ~12 
amino acids (Zhu et al, 2006), which includes a significant 
fraction of CDR-H3 loops, which have a median length of 
12 residues (Zemlin et al, 2003). This is largely because the 
sampling problem rapidly becomes more difficult with in- 
creasing loop length. However, it should also be kept in 
mind that long surface loops in proteins are often intrinsical- 
ly flexible, such that the observed conformation can vary de- 
pending on its interactions with antigen or due to 'artifacts' 
associated with crystal packing (Rapp and Pollack, 2005). 
These considerations should be born in mind when evaluat- 
ing or applying loop prediction methods to CDR-H3 . 

Theoretically, the most reliable approach should be to 
identify the loop conformations with the lowest free energy 
values at room temperature. For that purpose, Shirai et al 
(1998) and Kim et al (1999) performed the multicanonical 
molecular dynamics simulations (Nakajima et al, 1997; 
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Higo et al., 2012) to probe the free energy landscapes, and 
they obtained the stable structural ensembles of the CDR-H3 
loops at 300 K, which were found to include the correspond- 
ing crystal structures. In addition, characterizing the free 
energy landscapes should aid in predicting structural changes 
of CDR-H3 upon antigen binding based on the 'population 
shift' principle that suggests that bound conformations 
should be relatively close in free energy to the unbound state 
(Watanabe et al., 2006; Okazaki and Takada, 2008). Higo 
et al. (20 11) recently observed both the population shift and 
the induced fit mechanisms during the coupled folding and 
binding in an intrinsically disordered protein. However, 
although conformational sampling methods have matured, 
the underlying energy models — generally all-atom force 
fields — remain imperfect, limiting the accuracy of the results. 
We expect that, in the future, with the further refinement of 
force field parameters, methods based on generating struc- 
tural ensembles will be routinely used to accurately predict 
CDR-H3 loop conformations with and without antigens. 

Predicting VJV^ domain orientations 

Another challenge in antibody modeling is predicting the 
VJVn domain orientations. The orientations of these two 
domains vary among different antibodies, and can also 
change upon antigen binding as seen in early work by 
Stanfield et al. (1993; Fig. 2B). When modeling antibody 
structures from sequence, the relative orientations of the 
domains will affect the structure of the antigen-binding 
surface. Thus, understanding sequence -structure relation- 
ships as well as the energetics of Vi^/Vu domains interface is 
crucial not only to elucidate the structural diversity of anti- 
bodies, but also to accurately model antibody structures. 

A three-layered structure was proposed for the Vi^/Vh inter- 
face based on analyzing antibody crystal structures and 
sequences (Chothia et al., 1985; Vargas-Madrazo and 
Paz-Garcia, 2003). The key residues in these layers are also 
important for determining the conformations of CDR-H3 
(Kuroda et al., 2008). Therefore it is apparent that the accur- 
acy of domain orientation prediction can affect the accuracy 
of CDR-H3 modeling and vice versa, which makes sense 
because CDR-H3 is located close to the interface with the Vl 
domain. In one striking example, a crystal structure of a 
'chimeric' antibody, which had a different Vl domain from 
the original antibody, was solved, and a structural change of 
CDR-H3 was observed upon the Vl domain substitution 
(RMSD 2.88 A) whereas the canonical structures of the par- 
ental antibodies are maintained (Fig. 3A; Pei et al., 1997). In 
another example, however, the V^i domain of an antibody 
accommodated a different domain from another antibody 
without any significant structural changes of CDR-H3 
(RMSD 0.54 A) or the other portions of the Va domain 
(Fig. 3B; Teplyakov et al., 2010a). These studies, in which 
artificial Vh/Vh domain pairs form stable complexes, shed 
light on the possibility of multi-specific antibodies (Bostrom 
et al., 2009), which recognize two distinct antigens with non- 
overlapping functional paratopes. That is, variable domains 
of light and heavy chains, which recognize an antigen, 
respectively, independently of the other chain, can be com- 
bined, leading to a multi-specific antibody. Considering the 
large number of Vl and Vh sequences in nature, computer- 
guided approaches to explore the potential pairing of these 




Fig. 3. Effect of Vl domain substitution on CDR-H3 conformation. (A) 
Bl-8 (1A6V; magenta (Vh) and pink (Vl)) and B1-8/NQ11 (INQB; gray 
(Vh) and wliite (Vl))- The Vl domain of Bl-8 (pink) was replaced by that 
of another antibody, NQll, which results in a 'chimeric' antibody termed 
B1-8/NQ11. Structural change of CDR-H3 is observed (RMSD 2.9 A). (B) 
C836 (3L7E; magenta (Vh) and pink (Vl)) and X836 (3MBX; gray (Vh) and 
white (Vl)). The Vl domain of C836 (pink) was replaced by that of another 
antibody, which results in a 'chimeric" antibody termed X836. Structural 
change of CDR-H3 is not observed (RMSD 0.5 A). 



two domains and the domain-wise contributions to antigen 
binding, which would be difficult to examine by experi- 
ments, are expected. 

Exploring the possibility of the prediction of correct 
domain orientations based on energetics, Jacobson and cow- 
orkers examined the variations of Vl/^h domain orientations 
in crystal structures and how they relate to sequence 
(Narayanan et al, 2009). In this study, the difference in 
domain orientation was defined as the non-CDR Ca RMSD 
of the Vl domain when structural superposition is performed 
using the Vh domain. They also calculated molecular 
mechanics energies between the two domains, showing that 
the native Vl/Vh orientations correspond to those of their 
energy minima in conformational space. 

In another study, Abhinandan and Marin defined the 
^l/^h packing angles as a torsion angle at the interface 
using four pseudo-coordinates derived from a set of structur- 
ally conserved residues, and determined the distribution of 
the packing angle, which varies from —31.0 to —60.8° fol- 
lowing approximately a normal distribution with a mean of 
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-45.6° (Abhinandan and Martin, 2010). The authors pro- 
posed a method to predict the Vl/^h packing angles using 
neural networks trained against several interface residues in 
antibody crystal structures. This work also identified some 
potential key residues (L38, L40, L41, L44, L46, L87, H33, 
H42, H45, H60, H62, H91, H105), which are located in the 
framework regions and are extracted by the genetic algo- 
rithm, for the prediction of the packing angle and for anti- 
body engineering (Abhinandan and Martin, 2010). Here it is 
noteworthy that the evaluation of Vj^/Vu domain orientations 
only by the packing angles could neglect another aspect of 
the variations: translation of each domain. 

Tramontane and coworkers also analyzed the variability of 
Vl/Vh domain angles using cluster analysis based on a dis- 
tance by LGA software (Chailyan et ai, 2011b). The number 
of antibody crystal structures with k light chain in the PDB 
is much larger than that of antibodies with X light chain, and 
the authors used a 'balanced light chain set', which included 
similar numbers of Vk and VX antibody sequences, respect- 
ively. They found that VJVh pairings can be divided into 
two major clusters where one cluster is only observed in 
mouse antibodies and the other is observed both in mouse 
and in human, indicating the importance of template selec- 
tion in humanization procedure. They pointed out that these 
clusters could be assigned from the sequences by looking at 
some specific positions (L8, L28, L36, L41, L42, L43, L44, 
L66) in the sequences. Interestingly, only two of these resi- 
dues, L41 and L44, are shared with the residues proposed by 
Abhinandan and Marin as the packing determinants. The 
AbNum scheme (Abhinandan and Martin, 2008) was used in 
both studies for antibody numbering. Structural positions of 
these residues are described in Fig. 4. In another finding of 
Tramontane and coworkers, the antibodies in one of the clus- 
ters tend to recognize smaller antigens, such as haptens. 



A 

Vl domain. 




Fig. 4. Structural positions of residues that were identified as important for 
Martin (2010) and in Chailyan et al. (2011b). Consensus residues proposed 
Abhinandan and Martin (2010) and Chailyan et al. (201 lb) are shown as green 



whereas ones in the other cluster tend to show more promis- 
cuous binding, indicating the correlation between the relative 
orientation and the antigen-binding capability. 

The studies discussed above imply that VJV^i domain 
orientations are encoded by the sequence of each domain, 
but the dependence seems to be subtle. The domain orienta- 
tions can also change somewhat upon antigen binding and 
may fluctuate in solution. Another approach might be to use 
the predicted domain orientation to select the best templates 
for antibody modeling. Also currently unexplored is the cor- 
relation between the packing angle and other properties such 
as solubility and stability. 

Wang et al. (2009) performed covariation analyses using a 
large multiple sequence alignment of Ig-fold derived from 
NR-database at National Center for Biotechnology 
Information as well as from the PDB. Their calculations sug- 
gested the existence of conserved amino acid networks in Vh 
(H37, H38, H44, H45, H47, H103) and (L36, L37, L43, 
L44, L46, L98) domains, respectively, and the networks clus- 
tered in V-JV-n domain interface. They also identified the 
conserved network in Vh/Ch domain interface, but not in VJ 
Cl domain interface. Protein designs based on these con- 
served amino acid networks showed some successes in the 
context of generating thermostabile antibodies. 

Some software for antibody modeling is available on the 
web. Tramontane and coworkers developed the PIGS web 
server, which provides various options for choosing tem- 
plates for the light and heavy chains (Marcatili et al, 2008). 
It constructs CDRs simply by grafting canonical structures 
and CDR-H3 loops of other antibodies onto the modeled 
framework. RosettaAntibody uses a Vl/Vh docking proced- 
ure based on the Rosetta energy function and rigid-body 
minimization to refine the orientations following CDR-II3 
modeling (Sircar et al, 2009; Sivasubramanian et al, 2009). 



B 

Vh domain. 




VJVn packing. AbNum numbering scheme was used both in Abhinandan and 
in both studies are shown as yellow spheres while residues proposed only by 
and red spheres, respectively. (A) Vl domain. (B) Vh domain. 
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Web Antibody Modeling (WAM), which was a pioneering 
software package for antibody modeling developed by Rees 
and coworkers, is also available on the web (Whitelegg and 
Rees, 2000). Recently, Almagro and coworkers performed 
blind tests of antibody modeling methods including PIGS, 
RosettaAntibody and the other two modeling strategies inde- 
pendently developed in Chemical Computer Group and 
Accerlys Inc., using nine unpublished antibody crystal struc- 
tures at that time (Almagro et al, 2011). They demonstrated 
that, overall, the results of each method showed similar 
trends. In each case, the conformations of the framework 
regions were predicted quite well (average RMSDs are 
~ 1 .0 A) whereas they tended to fail to predict the conforma- 
tions of CDR-H3 (average RMSDs > 3.0 A). These results 
reconfirmed the difficulty of modeling CDR-H3, and there 
seems to be much room to improve the methods. More im- 
portantly, this work may lead to additional blind-prediction 
exercises, analogous to CASP (Critical Assessment of protein 
Structure Prediction) and CAPRI (Critical Assessment of 
PRediction of Interactions). 

There are a few public databases that contain antibody 
data from the PDB (Allcorn and Martin, 2002; Kaas et al, 
2004) or from the large sequence database (Retter et al, 
2005; Giudicelli et al, 2006). Considering the rapid increase 
in antibody structural and sequence data, these public data- 
bases will play important roles in organizing the information. 

Antibody-antigen recognition 

Needless to say, the best way to gain atomic-level insights 
into antibody -antigen recognition is to determine crystal 
structures of the complexes by X-ray crystallography. 
However solving the structures of antibody -antigen com- 
plexes is frequently challenging. Computational protein- 
protein docking methods provide an alternative way to study 
antibody -antigen interactions when structures are not avail- 
able. Although computational protein-protein docking 
methods are rapidly improving, there are still limitations, es- 
pecially when modeled structures are used as starting points 
(Gray, 2006). On the other hand, there are a few advantages 
in applying protein-protein docking to antibody -antigen 
complexes. First, we know that antigens bind to the CDRs of 
antibody structures. In addition, the residues in antigens that 
the antibody binds to, called epitopes, can be predicted by 
computational methods to some extent (Evans, 2008). This 
information can help to more accurately predict antigen - 
antibody complex structures, along with any other experi- 
mental data such as from mutagenesis that can be used to 
restrain the protein -protein docking. 

In general, antigenic epitopes in proteins are discontinuous 
in the protein sequences (Barlow et al, 1986; Van 
Regenmortel, 1996). One problem in epitope prediction is 
that many existing methods focus on protein sequences to 
predict continuous epitopes. The increase in the number of 
antibody -antigen complex structures has enabled us to better 
understand the amino acid compositions of antibody -antigen 
interfaces and how they differ from those of other heterodi- 
mer proteins. Structure-based epitope prediction algorithms 
have begun to be developed based on such information 
(Liang et al, 2009). Another problem is that current 
methods, including structure-based algorithms (Haste 
Andersen et al, 2006), are not designed to predict epitopes 



for specific antibodies. Recently Soga et al (2010) have 
developed a method to predict epitope residues for individual 
antibodies from the sequence composition of the antibody- 
antigen interfaces. 

In general, protein -protein interactions can be classified 
into permanent and transient interactions (Ozbabacan et al, 
2011). Antibody -antigen interactions are often considered to 
be transient (Cho et al, 2006). Although we might expect 
high complementarity of antibody -antigen interfaces since 
antibodies often show high specificities and affinities, a pre- 
vious analysis based on a small dataset of available 
structures showed that the shape complementarity of anti- 
body-antigen interfaces was lower on average than that of 
protein-protein interfaces in general (Lawrence and Colman, 
1993). From an evolutionary perspective, antibodies could 
evolve independently of antigens in our body, while other 
proteins involved in protein-protein interactions can evolve 
with their counterparts, which is often called coevolution, so 
that the shape complementarity of general proteins could be 
optimized against their partners. However, a later analysis 
with high-resolution crystal structures suggested the possibil- 
ity that the previous observation that antibody -antigen 
complexes had lower shape complementarity was due to the 
low quality of the crystal structures (Cohen et al, 2005). It 
has been also found that overall topography of antigen- 
binding sites is roughly classified into a limited number 
of groups associated with different types of antigens 
(MacCallum et al, 1996; Lee et al, 2006). For example, 
anti-protein antibodies tend to have flat antigen-binding sites 
while those of anti-peptide antibodies have a groove-like 
shape, in which peptides are buried. Electrostatic comple- 
mentarity of antibody -antigen interfaces seemed to show a 
similar trend to that of protein-protein interface in general 
(McCoy et al, 1997). Since the discussions above were 
based on a limited number of antibody -antigen structures, a 
more comprehensive analysis of the complementarities of the 
interfaces is required to understand the essential nature of 
antibody -antigen recognitions, helping to guide the rational 
design of antibody therapeutics. 

Lysozyme has frequently been used as a model antigen to 
gain insights into antibody -antigen interactions (Davies and 
Cohen, 1996). In computational studies, Smith-Gill and 
coworkers investigated antibody -antigen interactions using 
homology-modeled anti-lysozyme antibody structures, 
HyHELS and HyHEL26, and a crystal structure, HyHELlO 
(Sinha et al, 2002; Mohan et al, 2003). They calculated the 
binding energies and performed kinetics experiments, 
showing that a greater electrostatic contribution to antigen 
binding leads to higher specificity while hydrophobic interac- 
tions favor conformational flexibility and cross-reactivity. 
Their subsequent work based on molecular dynamics (MD) 
simulations with a crystal structure, HyHEL63, also sup- 
ported the importance of electrostatic forces in the anti- 
body-antigen association (Sinha et al, 2007). 

In a recent application of protein -protein docking to 
antibody -antigen complexes. Gray and coworkers used anti- 
body models generated by WAM and antigen structures to 
propose the potential binding modes of antibody -antigen 
complexes using RosettaDock (Sivasubramanian et al, 2006, 
2008). The protein -protein docking used a rigid-body Monte 
Carlo (MC) search and a low-resolution interaction potential 
including an alignment score specific for antibodies in an 
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initial stage, and, in the subsequence refinement stage, an 
all-atom scoring function was used to select a solution (Gray 
et ai, 2003; Gray, 2006). In one of the examples, an 
EGFR-binding antibody, they also incorporated computational 
and experimental mutagenesis information into their docking 
procedures (Sivasubramanian et ai, 2006). 

The flexibility of protein -protein interactions should be 
incorporated when considering the possibility of structural 
change upon binding (Zacharias, 2010). Structural errors in 
comparative models could also lead to serious errors in the 
subsequent steps in protein-protein docking. Methods are 
available for treating side chain flexibility, for instance by 
taking advantage of rotamer libraries, while dealing with 
backbone flexibility is much more difficult. Although impli- 
cit, one approach to incorporate backbone flexibility in 
protein -protein docking is the so-called ensemble docking, 
which uses several structures generated by sampling methods 
as starting points. MD simulations can also be used to gener- 
ate such conformational ensembles. An advantage of ensem- 
ble docking is that it uses rapid rigid protein -protein 
docking methods, which have shown good performance in 
CAPRI contests so far, to identify the correct near-native 
models (Bonvin, 2006). 

Gray and coworkers examined how ensemble docking is 
effective by varying the treatment of backbone flexibilities in 
each ensemble in protein-protein docking simulations in 
general (Chaudhury and Gray, 2008). They took advantage of 
two ensembles for docking: computational ensembles gener- 
ated by RosettaRelax (Misura and Baker, 2005) and nuclear 
magnetic resonance (NMR) ensembles provided by NMR 
solution state studies. Although the backbone flexibilities 
were restricted only to the ligand proteins, they showed 
better performance using ensemble dockings than using a 
rigid-body protein -protein docking. Recently, for the appli- 
cation of modeling antibody -antigen complexes, they also 
developed a novel docking procedure called SnugDock, 
which simultaneously optimizes paratopes, i.e. CDR loops, 
and V]^/Vn domain orientations during antibody -antigen 
docking (Sircar and Gray, 2010). In a subsequent study, this 
method was applied to protein-protein docking in general, 
showing the versatility of this protocol (Sircar et ai, 2010). 
In another innovative study, Simonelli et al. (2010) 
combined experimental data with a computational protein - 
protein docking method. They used several antibody models 
created by the PIGS and RosettaAntibody programs as an 
ensemble, and docked them to the antigen, a surface protein 
of dengue virus, using RosettaDock. They took advantage of 
NMR chemical shift data to validate the docking results, 
showing that NMR epitope mapping improved the accuracy 
of computational docking. 

Looking forward, incorporating backbone flexibility in the 
docking procedure is a promising and an essential approach 
to take into account the structural change upon binding and 
to overcome the small structural errors expected when using 
homology-modeled structures. 

Affinity maturation by somatic mutations 
and computational design 

Antibodies can evolve in a short time in response to antigens, 
so that they are more specific to their antigens and have higher 
affinity, mainly by improving the complementarity of the 



antibody- antigen interfaces (Li et al, 2003; Cauerhff et ai, 
2004; Acierno et al, 2007). Somatic mutations can occur not 
only in the antigen-binding site, but also in the framework 
region, which antigens usually do not contact physically 
(Lavoie et ai, 1992). Computational methods have played an 
important role in understanding how these mutations improve 
binding affinity, and more recently, have also been used to 
predict novel mutations to improve affinity or specificity. 

Based on the analysis of a large number of sequences, 
Clark et al. (2006) examined trends in amino acid substitu- 
tions during the somatic maturation process. Specifically, 
using a gene-fitting procedure with codon probability tables, 
they examined mutation probabilities in 23 116 heavy chains 
and 11 095 light chains. By comparing the probabilities with 
the conventional BLOSUM matrix, they concluded that 
amino acid substitutions in somatic mutation process have 
similar trends to those observed in protein evolution in 
general. They also showed that mutations tend to occur in 
antibody -antigen interfaces and in exposed surface regions, 
rather than the framework region. Finally, they examined 
changes of amino acid composition in the antibody -antigen 
interface, showing that the numbers of tyrosine, serine and 
tryptophan residues decrease while those of histidine, proline 
and phenylalanine increase during the maturation process. 

Several computational studies have been conducted to in- 
vestigate the relationship between somatic mutations and 
conformational flexibilities of antibodies. A plausible hy- 
pothesis is that antibodies acquire rigidity during maturation 
in order to increase binding affinity (by minimizing entropic 
losses upon binding) and to improve specificity. In fact, 
crystal structures of antibody drugs available in the PDB are 
predicted to assume rigid conformations based on their 
sequence features (Kuroda et al, 2008). In these studies, the 
rigidification was hypothesized to occur within the entire 
variable domains rather than within only CDRs, by forming 
regular secondary structures, such as p-turns, and hydrogen 
bonds (Kuroda et al, 2008), or by making residue contact 
networks, during the maturation process (Zimmermann et al, 
2010). Thermodynamic experiments and MD simulations 
demonstrated that all residues in variable domains may par- 
ticipate in the rigidification rather than the limited number of 
CDR residues (Jimenez et al, 2003; Thielges et al, 2008; 
Zimmermann et al, 2010). 

Kollman and coworkers (Chong et al, 1999) used MD 
simulations to study the germline and matured anti-hapten 
antibodies, 48G7 (Patten et al, 1996; Wedemayer et al, 
1997). They showed that maturation of 48G7 was driven by 
electrostatic optimization of the antigen-binding site, which 
resulted in geometric reorganization and rigidification, as 
manifested by smaller root mean square fluctuations of 
matured 48G7 compared with those of the germline anti- 
body. Demirel and Lesk (2005) investigated the same anti- 
bodies using an elastic network model and concluded that 
the fluctuations of the germline antibody were similar to 
those of matured one. Rather, the structural difference 
observed in crystal structures was thought to result from the 
different binding mechanisms of germline and matured anti- 
bodies; induced fit could occur upon hapten binding in the 
germline antibody while hapten binding to the mature one 
was explained by a lock-and-key model. A second well- 
studied pair of germline and mature antibodies are those of 
the anti-hapten antibody, 4-4-20. Experimental results and 
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MD simulations again suggested that the somatic mutations, 
several of which are not in direct contact with the antigen, 
restrict the conformational fluctuations in the mature anti- 
body (Zimmermann et al., 2006; Thorpe and Brooks, 2007). 
More recently, Babor and Kortemme analyzed several germ- 
line and matured antibody structures in the PDB using their 
multi-constraint computational design protocol (Humphris 
and Kortemme, 2007) with the Rosetta scoring function, and 
confirmed that native sequences of germline CDR-H3 are 
designed for conformational flexibility while those of 
matured ones are highly optimized for single conformations 
(Babor and Kortemme, 2009). On the other hand, rigidifica- 
tion upon maturation is most likely not universal, because a 
matured anti-hapten antibody, SPE-7, is known to assume 
several different conformations in response to binding differ- 
ent antigens (James et al., 2003; James and Tawfik, 2003). In 
addition, a series of studies of anti-hapten antibodies has 
demonstrated that rapid maturation based on CDR rigidifica- 
tion tend to arrive at a moderate affinity ceiling whereas flex- 
ibilities in CDR-H3 may play an important role in higher 
affinity maturation (Furukawa et al, 1999, 2001). 

To more directly determine how somatic mutations modu- 
late CDRs flexibility in nature, Jacobson and coworker also 
analyzed a set of antibodies, whose crystal structures were 
solved both in germline and matured forms, without their 
antigens in the crystal structures, using MD simulations with 
an implicit solvent model (Wong et al, 2011). Significant 
rigidification was observed for CDR residues that contact the 
antigen, mainly in CDR-H3, while the other five CDRs fre- 
quently displayed some increase in flexibility. The observed 
rigidification was explained by new hydrogen bonds, salt 
bridges and side chain packings that were improved by 
somatic mutations. Interestingly, conformations similar to the 
germline antigen free- and bound forms of 7G12 (INGZ and 
1N7M, respectively) were reproduced when using the germ- 
line free form (INGZ) as the initial structure in the simula- 
tion, whereas the germline bound form was not sampled 
when the matured free form (INGY) was used as the initial 
structure. This observation suggested a population shift 
mechanism of antigen binding in the 7G12 antibody. 
Consistent with his hypothesis, a structural change of 
CDR-H3 upon antigen binding was observed in the germline 
crystal structures (RMSD of CDR-H3 between the antigen 
bound- and free forms is 2.22 A) while that of matured form 
is not observed (RMSD 0.39 A), and CDR-H3 of the germ- 
line bound form is almost identical to those of matured 
antigen free- and bound forms (RMSD of CDR-H3 in the 
germline form against those in matured antigen free- and 
bound forms is 0.39 and 0.42 A, respectively) (Fig. 5). In 
this antibody, the S97M mutation in the Vh domain contribu- 
ted to improve hydrophobic contacts and modulated the flexi- 
bility of CDR-H3 by anchoring the loop to its antigen free 
from, thus shifting the equilibrium toward the conformation 
most suitable for binding, so that the matured form could 
bind the antigen without entropic penalty. Another mutation 
that modulates flexibility in this antibody was the mutation 
of Ser to Asn at the position 76 in the Vh domain. This 
S76N mutation anchored CDR-Hl loop via hydrogen bonds 
during the simulations. In the simulations of the germline 
form, the Ser residue at the position 76 rarely formed hydro- 
gen bonds with the CDR-Hl whereas, in the simulations of 
the matured form, the Asn at the corresponding position 




Fig. 5. Positions of somatic mutations in a metal chelatase catalytic 
antibody, 7G12. The mutations, R49M, S76N and S97M in Vh domain, are 
represented as spheres. Germline and matured antigen bound forms are 
represented by magenta (1N7M) and yellow (3FCT) model, respectively. 
Germline and matured antigen-free forms are represented by white (INGZ) 
and cyan (INGY) model, respectively. The antigen is shown as a black 
surface model. The residues at position 49 and 97 contact the antigen 
whereas the residue at position 76 is far from the antigen-binding site. 
Structural change of CDR-H3 is observed only in the germline form 
(magenta and white; RMSD 2.2 A. See text.). 



often formed multiple hydrogen bonds via the side chain 
with the CDR-Hl, which results in the rigidification of the 
CDR. It is noteworthy that position 76 in the Vh domain is 
far from the antigen-binding site (Fig. 5). Therefore, this 
example shows how mutations distant from the antigen- 
binding site can modulate binding. 

Although antibodies can evolve in vivo through affinity 
maturation, arguments have emerged that binding affinities 
and specificity could be further improved by using artificial 
systems, such as synthetic antibody libraries (Soderlind 
et al., 2000; Borrebaeck and Ohlin, 2002; Bond et al., 2005; 
Sidhu and Fellouse, 2006; Michnick and Sidhu, 2008; Babor 
et al, 2011) and computational protein designs (Babor and 
Kortemme, 2009). Here we review some recent successes of 
the latter approach, in which the affinities of natural anti- 
bodies were improved to varying extents (Table I). Details of 
protein design methodologies are reviewed elsewhere 
(Rosenberg and Goldblum, 2006; Lippow and Tidor, 2007; 
Karanicolas and Kuhlman, 2009). 

Most of the current examples of computational antibody 
design start from antigen -antibody complex structures in the 
PDB, that is, the studies can be described as 're-design' of 
antigen -antibody interfaces. Protein interface re-design 
relies on estimating free energy changes due to amino acid 
substitutions, which can be accomplished using physics- 
based force fields (Huang et al., 2006) or knowledge-based 
potentials derived from the structural database (Ota et al, 
2001; Russ and Ranganathan, 2002; Clark and van Vlijmen, 
2008). Similar to comparative modeling methods, re-design 
also requires efficient conformational search algorithms, such 
as dead-end elimination (DEE) (Desmet et al, 1992; Dahiyat 
and Mayo, 1996, 1997; De Maeyer et al, 2000) and MC 
searches (Koehl and Levitt, 1999; Kuhlman and Baker, 
2000), to sample conformations of modified structures. In 
most studies, protein backbones have been fixed during the 
procedures due to computational limitations. However, it has 
become possible to incorporate explicit backbone flexibility 
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Table I. Results of computational designs 



Initial structures'" Resolution Antigen K^^'' A:"'"''' Affinity Designed mutations'' Ref 

(A) improvement" 



AQC2 (IMHP) 


2.80 


Integrin 


7nM 


850 pM 


10 


L: S28Q, N52E, H: T50V, K64E 


Clark et al. (2006b) 


D44.1 (IMLC) 


2.50 


Lysozyme 


4.4 nM 


30 pM 


140 


L: N92A, H: T28D, E35S, S57V, T58D, 


Lippow et al. 














G99D 


(2007) 


TA4 


N/A' 


Gastrin 


6 mM 


13.2 nM 


454 


L: H91F, Q92F, R94P, V96A, H; IIOOL, 


Barderas et al. 














S102V 


(2008) 


E2 (3BN9) 


2.17 


Protease 


4.8 nM 


340 pM 


14 


H: T98R 


Farady et al. (2009) 



''Antibody structures to be designed. PDB codes are shown in parentheses. 
''Reported values in the papers. For E2, Ki value is shown. 
'^Reported values in the papers. 

''There are a number of designed mutations in each works. Only representatives are shown. L and H represent light and heavy chains, respectively. 
"Homology modeled structure was used. 



in computational protein design (Hu et al., 2007; Murphy 
et al, 2009). 

Clark et al. optimized an antibody AQC2 for binding to 
the I domain of human integrin VLAl using the CHARMM 
force field (Brooks et al, 1983), the ICE software (Kangas 
and Tidor, 1998), which calculates electrostatic interactions 
and desolvation energies, and a DEE-based side chain 
repacking software, DEZYMER, developed by Hellinga and 
coworkers (Looger and Hellinga, 2001). This approach suc- 
ceeded in improving the binding affinity even though the ori- 
ginal antibody has the relatively high affinity (/("o ~ 7 nM) 
(Clark et al, 2006a). They tried >80 designed variants in- 
cluding multiple mutations, in which the combination of four 
mutations (L: S28Q, N52E, H: T50V, K64E; Fig. 6A) dis- 
played the highest affinity of 850 pM, which was a 10-fold 
improvement over the wild type. The designed structure was 
solved and the rearrangement of a hydrogen bond network in 
the interface was elucidated. 

In another successful antibody -antigen re-design, Lippow 
et al. (2007) focused on electrostatic interactions. One of the 
obvious advantages to optimize electrostatic forces is that 
long-range interactions could be incorporated into design 
process. They designed single mutations in all CDR positions 
using the DEE and A* search algorithms with the 
CHARMM force field, ranked them by Poisson-Boltzmann 
electrostatics and demonstrated that four mutations (L: 
N92A, H: T28D, S57V, T58D; Fig. 6B) together generate 
100-fold improvement of the affinity over wild type (43 pM 
vs. 4.4 nM for wild type). 

A somewhat different use of computational modeling is to 
identify favorable positions for experimental random mutagen- 
esis, rather than relying on computational prediction to iden- 
tify specific favorable mutations. In such a study, Barderas 
et al. (2008) used computational modeling, including con- 
strained antigen docking and MD simulations, to select favor- 
able residues for random mutagenesis in CDR-H3 and -L3 by 
phage display. This approach resulted in mature anti-gastrin 
antibodies with binding affinities as potent as 13.2 nM, a 
454-fold improvement over wild type based on six mutations. 

In some cases, specificity, and not affinity, is the major 
challenge addressed by computational design, as in the work 
by Farady et al. (2009), which aimed to improve the species 
cross-reactivity of an anti-protease antibody, E2. The 
problem is that antibodies derived from mice immunized by 
human antigens often display low efficacy in pre-clinical 
animal models because the structures and sequences of the 



antigen targets are not completely conserved between 
humans and mice. In the case of the protease antigen, called 
MT-SPl in human and epithin in mouse, they share 87% 
sequence identity, and three surface residues that contact 
the antibody differ. Based on the complex structure of 
MT-SP1/E2, an epithin/E2 structure was modeled. Residues 
in the antibody that contacted the antigen were selected for 
in silico mutations and several mutations were predicted to 
improve the affinity for epithin on the basis of a molecular 
mechanics energy function. The T98R mutation (Fig. 6C) in 
CDR-H3 showed the largest predicted free energy change, 
and subsequent experimental tests confirmed that the muta- 
tion results in a 14-fold affinity improvement over wild type 
(^Ti reduced from 4.8 nM to 340 pM). The mutation is 
believed to improve epithin binding by introducing new 
hydrogen bonds in the antigen -antibody interface. 

What can we learn from these successes? First, high- 
resolution crystal structures are not necessarily required for 
computational design. As shown in Table I, initial structures, 
which have moderate resolution from 2.17 to 2.80 A, could 
be exploited for computational design, as well as homology 
models of antibodies and antigens. These successes imply 
that current antibody modeling methods might be sufficiently 
accurate to be used in computational antibody design. 
Second, computational protein -protein docking could be 
used as a first step in design procedure, particularly in cases 
where experimental information can be used to help to guide 
the docking. Third, significant improvement in binding affin- 
ity frequently requires multiple amino acid substitutions. 
Finally, as seen in Figs 5 and 6, it is clear that not only 
CDR-H3 but also the other five CDR loops can be candidates 
for affinity-enhancing mutations. Conformational changes 
that occur upon antigen binding, especially in CDR-H3, 
present a challenge for antibody design. In addition, current 
protocols mainly focus on optimizing antigen-contacting 
residues in CDRs. Mutations during somatic maturations, 
however, often occur in the framework regions, which anti- 
gens cannot directly contact (Lavoie et al, 1992). Exploiting 
mutations of framework regions might be possible when 
incorporating backbone flexibility during computational 
designs. 

Stability design and aggregation in antibodies 

Another challenging application of computational designs in 
antibody engineering is to predict aggregate-prone regions 
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Fig. 6. Positions of designed mutations in antibodies. CDR-H3 is colored 
blue. Positions of designed mutations are represented as sphere models. 
Original residues are shown. (A) Anti-integrin antibody, AQC2 (IMHP). 
(B) Anti-lysozyme antibody, D44.1 (IMLC). (C) Anti-protease antibody, 
E2 (3BN9). 



(APRs) from amino acid sequences and to design 
aggregate-resistant antibodies by introducing mutations in 
those regions. One mechanism of aggregation is the forma- 
tion of amyloid fibrils, which are rich in (3-sheet structure, 
and have been associated with diseases such as Alzheimer's 
or Parkinson's diseases (Dobson, 1999). Experimental obser- 
vations suggest that short segments of a protein could facili- 
tate aggregation (Ventura et al., 2004), and these are termed 
as APRs. Currently, several computational methods are 
available to predict APRs and rates of aggregations, mainly 
based on the sequence composition and on propensities 
such as hydrophobicity, charge and secondary structure pro- 
pensity (Caflisch, 2006; Conchillo-Sole et ah, 2007; Trovato 



et al., 2007; Garbuzynskiy et al, 2010). Empirical or phe- 
nomenological models have been proposed for the effect of 
mutations on the aggregation rate on the basis of the ex- 
perimental data (Chiti et al., 2003; DuBay et al, 2004). 
MD simulations have also been used to interpret and com- 
plement experimental studies of protein aggregation and 
amyloidosis (Ma and Nussinov, 2006; Sharma et al, 2008). 
A review of aggregation in protein therapeutics is available 
elsewhere (Agrawal et al., 2011), so here we limit our- 
selves to recent studies on design of aggregation-resistant 
antibodies. 

Aggregation in therapeutic proteins can lead to problems 
with immunogenicity (Rosenberg, 2006; Kumar et al., 2011) 
and in addition, antibodies have been known to aggregate in 
high concentration of formulations for therapy and in storage 
(Shire et al, 2004). Many experimental studies have been 
performed to investigate antibody stability and resistance to 
aggregation, primarily using single-chain Fv fragments 
(scFv) (Bird et al., 1988; Glockshuber et al., 1990; Worn and 
Pluckthun, 1998, 1999, 2001). 

Designed mutations for aggregation-resistant antibodies 
may induce the loss of desired affinities. Wang et al. exam- 
ined how APRs in antibody structures contribute to antigen 
recognition using TANGO (Fernandez-Escamilla et al., 
2004) and PAGE (Tartaglia et al, 2005), which are computa- 
tional tools to predict APRs in protein sequences, based on 
29 antigen -antibody complexes in the PDB (Wang et al, 
2010). They found that the APRs most frequently appeared 
in CDR-H2, and less frequently in CDR-H3, although resi- 
dues in the N-terminal framework region of CDR-H3 are 
predicted to be APRs. Based on changes of buried surface 
area upon antigen binding, they also showed that aromatic 
residues, Tyr and Trp, are favored both in antigen-binding 
sites and APRs, indicating that aggregation may be asso- 
ciated with antigen binding via these residues. 

In a different, structure-based approach to predicting 
APRs in antibodies. Trout and coworkers proposed a novel 
measure called spatial aggregation propensity (SAP), which 
quantifies the exposure of hydrophobic residues, averaged 
over snapshots from a MD simulations with explicit water 
(Chennamsetty et al, 2009a,b; Voynov et al, 2009). Using 
this measure, they identified 14 aggregation-prone motifs in 
constant regions of human IgG molecules and, by comparing 
the amino acid sequences, they noticed that those APRs are 
not conserved among the other antibody classes (IgAl, IgD, 
IgE and IgM) (Chennamsetty et al, 2009a). In these works, 
antibody structures were used to derive SAP values but anti- 
body models could potentially be used when crystal struc- 
tures are unavailable. Subsequent work from the same lab 
demonstrated that the SAP measure could be applied to anti- 
body fragments, such as Fab or Fc, and homology-modeled 
structures, and that MD simulations could be performed with 
implicit solvent models to obtain SAP values with reasonable 
computational time and tolerable accuracies (Chennamsetty 
et al, 2010). 

In an experimental study, Perchiacca et al. (2011) per- 
formed mutational analysis of a human Vh domain antibody 
to investigate the molecular origin of the aggregation by 
comparing the biophysical properties of an aggregation-prone 
antibody and the aggregation-resistant counterpart. The dif- 
ference between the two antibodies is only in the sequence 
composition of each CDR. Although the introduction of 
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charged mutations distant from the CDRs failed to confer 
aggregation-resistance property for the aggregation-prone 
antibody, the mutations within the first CDR loop (H31- 
H33) or near the CDR (H29) successfully produced an 
aggregation-resistant antibody. It is noteworthy that, while 
the results of some computational methods that were used to 
predict APRs in this antibody contain several false positives, 
the consensus APR of those prediction tools (H28-H32) is 
almost identical to the aggregation hot spots in this experi- 
ment (H29, H31-H33). This indicates that computational 
meta-approaches that integrate prediction results from several 
methods may help to improve prediction of APRs in 
antibodies. 

Wang and Duan (2011) performed MD simulations with 
the aim of proposing a strategy for improving the thermal 
stability of an anti-VEGF scFv antibody. The simulations 
demonstrated that, at least in the scFv antibody, when the 
temperature is elevated, the dissociation of Vl/^h domain 
interface occurred at first followed by unfolding of Vl 
domain and then partial unfolding of Vu domain, suggesting 
that stability of the Vl/^h domain interface may govern the 
stability of entire antibody structure. If this is correct, then 
engineering of the Vj^/Vu domain interface is the first choice 
for improving the thermal stability of the antibody. The ex- 
tensive analyses of Vi^/Vu domain interfaces reviewed in the 
previous section may provide guidance for producing more 
stable antibodies based on Vl/^h interface designs in silico. 

Antigen design to elicit NAbs 

Advances in computational protein design methods, and the 
increasing number of protein crystal structures, may make it 
possible to develop vaccine candidates more rationally and 
efficiently. Vaccine injections elicit desired NAbs for anti- 
gens or viruses targeted in therapy (Burton, 2002). HIV-1 
has attracted much attention because of the lack of appropri- 
ate vaccines and because of the increasing number of 
patients. NAbs work either by binding to the virus surface 
and then blocking subsequent cellular response, or by 
binding after virion attachment to a target cell so that they 
can inhibit viral entry. HIV-1 shows significant variability of 
the sequences and glycosylation patterns, which results in its 
escaping from human immune systems (Burton et al., 2004). 
A few regions in the HIV-1 envelope spike regions are struc- 
turally conserved among diverse isolates, including the 
CD4-binding site on gpl20, the coreceptor binding site on 
gpl20 and the membrane proximal external region (MPER) 
of gp41, all of which can be targets for binding by antibodies. 
Thus, eliciting broadly NAbs by those conserved epitopes or 
by their mimics is challenging, and advanced computational 
approaches could conceivably help tackle this issue. 

Ofek et al. (2010a,b) used computational-guided approach 
to design epitope scaffolds to elicit anti-HIV-1 antibodies. 
They introduced a known linear epitope of MPER, which is 
usually recognized by 2F5 antibody, into scaffolds selected 
from the PDB. The existing scaffolds were re-designed by 
RosettaDesign suit to introduce mutations so that the targeted 
epitope is accommodated in the scaffolds, leading to stable 
proteins that consist of the HIV-1 epitope and the scaffolds 
derived from unrelated proteins. These designs succeeded 
in eliciting some antibodies with similar binding modes of 
2F5 with the peptide epitope but with shorter CDR-H3 loops 



than 2F5. Despite this apparent success, the resulting anti- 
bodies did not neutralize the virus in an experimental test 
probably due to the lack of a long CDR-H3 loop with the ap- 
propriate hydrophobic nature. However, crystal structures 
demonstrated that the elicited antibodies induced the desired 
conformational change of the HIV epitope, which is encour- 
aging for computational immunogen design based on epitope 
scaffolds already existing in the PDB. They also suggested a 
correlation between epitope flexibility and immunogenicity 
based on the structures obtained. 

In a related study, Lapelosa et al. (2009, 2010) analyzed 
the MPER epitope with human rhinovirus (HRV) scaffolds 
using replica exchange molecular dynamics (REMD) simula- 
tion techniques with an implicit solvent model. Their inten- 
tion was to identify the conformational preferences of the 
epitope, which are considered to be recognized by 2F5 anti- 
body, and to identify the virus scaffold that maximizes the 
epitope exposure to solvent. They proposed a structural 
model of the induced antibody and HRV complex. This 
model was then used for further design of virus constructs. 
Using a focused library, the anchor of each end of the 2F5 
epitope in the modeled HRV sequence was optimized, and 
the binding affinity of the optimized models was evaluated 
experimentally. Subsequently, the authors used REMD simu- 
lations of the optimized models to show a correlation 
between the binding affinity and the conformational prefer- 
ences of the epitope, showing that conformations that closely 
mimicked ones in crystal structures displayed high affinity, 
confirming the validity of their proposed computational 
model and the strategy used. 

Another NAb that recognizes the MPER of gp41 is the 
4E10 antibody. Correia et al. (2010b) designed an epitope 
scaffold based on the binding mode of this antibody, again 
by using Rosetta. They searched for appropriate backbone 
scaffolds from the PDB, and designed the amino acid 
sequences with the goal of designing scaffolds that are 
soluble and stable with the epitope peptide in solution. Their 
subsequent experimental validation showed that the designed 
proteins consisting of the MPER epitope and the scaffolds 
bind to 4E10 with higher affinities (from pM to nM levels) 
compared with only the MPER peptide epitope itself (nM 
level). Then, they determined the crystal structures, demon- 
strating the similarity between the inserted epitope in the 
scaffold and the epitope recognized by 4E10 in the anti- 
body-peptide crystal structures. Based on these designed 
chimeric proteins, they also proposed two distinct computa- 
tional technologies: flexible backbone remodeling and resur- 
facing (Correia et al., 2010a). These methods were used to 
modify the structure of the designed proteins with the goal 
of the minimizing immunogenicity while retaining the 
thermal stability, solubility and high affinity toward cognate 
antibody obtained in the earlier study (Correia et al., 2010b). 
By using the backbone remodeling method, a globular 
domain, which assumes a-p-a topology, of the designed 
protein scaffold was trimmed, then the sequence of the 
remaining scaffold was designed by the resurfacing method 
or side chain replacement, resulting in a chimeric protein 
having a better melting temperature and a similar affinity to 
the parent protein. They also determined the crystal structure 
of the re-designed proteins, showing close agreement 
between the structure and the model. However they also 
pointed out that the loop region that was replaced from a 
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domain in the parent protein in the remodeling stage had a 
slightly different backbone conformation from the crystal 
structure, which results in packing errors in the region, 
demonstrating that even a small displacement of a backbone 
conformation could lead to an inaccurate prediction. 

In a related study, Oomen et al. (2003) designed an immu- 
nogen by synthesizing cyclic peptides, which reduce the 
peptide flexibility and mimic the |3-turn conformation in the 
peptide -antibody complex structure, as an alternative to 
using an epitope scaffold to elicit an antibody. They assessed 
conformational preferences and the stability of four designed 
peptides using MD simulations. The most stable and the 
least flexible peptide evaluated by the simulations actually 
induced a cross-reactive potent antibody in an experiment. 

In the studies discussed here, the common strategy for 
antigen designs is to produce epitope mimics of the NAbs 
identified from crystal structures, and a challenge is maxi- 
mizing the exposure of the mimics, which is supposed to be 
recognized by an antibody, to solvent in a certain protein en- 
vironment. Advanced computational techniques have been 
utilized in this context. Currently, a huge number of protein 
structures are available in public, and the number is still 
growing due to the structural genomics efforts. Hence search- 
ing for a protein scaffold from the structural database seems 
to be a promising approach for rational vaccine designs. De 
novo scaffolds for protein design were also proposed 
(MacDonald et al., 2010). In addition, MD simulations are 
required to analyze conformational preferences of epitopes in 
solution, ideally in conjunction with experiments such as 
NMR. 

Looking forward, we envision that computational methods 
may be particularly useful to study antibody -peptide com- 
plexes, that is, the recognition of sequential epitopes. The 
peptides often assume (3-turn conformations in an antigen- 
binding site, and the recognition mechanism seems to 
depend on the length of CDR-H3 (Bates et al., 1998). 
Considering their smaller size compared with protein anti- 
gens, the evaluation of free energy of binding for designed 
peptides by computer simulations is a promising approach 
for rational vaccine design. Furthermore, information con- 
cerning the behavior of mutant viral antigens may be quite 
useful for rational vaccine design since pandemic could 
occur because of antigenic drifts. Structures and simulations 
of antibody -antigen complexes provide essential information 
to analyze such behavior. 

Perspectives and concluding remarl<s 

Recent advances in computational technologies combined 
with dramatic advances in high-throughput technology 
(Reddy and Georgiou, 2011) have the potential to make im- 
portant contributions to the development of biological thera- 
peutics, including those based on antibodies and antigens. A 
central challenge is accurately predicting antibody structures 
from their sequences, and significant progress has been made 
toward this goal. The prediction of antibody -antigen-binding 
modes, by computational protein -protein docking, has also 
made great progress, especially by exploiting knowledge of 
the antigen-binding site and experimental data. Accurate pre- 
diction of epitopes and paratopes should further improve 
accuracy. Finally, we have reviewed several examples of 
computational affinity maturation (Clark et al, 2006a; 



Lippow et al, 2007; Farady et al, 2009), all of which have 
taken advantage of crystal structures of antibody -antigen 
complexes in the PDB except for Barderas et al. (2008). 

However, there are still no striking examples where all of 
these computational capabilities have been combined, i.e. 
both an antibody and an antigen are modeled from their 
sequences, their binding mode is predicted by protein-protein 
docking, and then the resultant model is used to predict 
mutations that improve binding affinity and/or specificity. In 
such an ambitious undertaking, relatively small errors in 
each step could be amplified in subsequent steps. However, 
we anticipate continuing improvements in all of these com- 
putational approaches; for example, the recent emergence of 
methods for incorporating backbone flexibility in protein 
docking and protein engineering will be potentially useful in 
computational antibody design. Treating loop flexibility is 
particularly important in antibody design because large con- 
formational changes can occur upon antigen binding, and 
affinity maturation can modulate both the conformations and 
dynamics of the CDR loops. 

Beyond improvements in conformational sampling 
methods, continuing improvements in 'scoring functions' 
used to estimate free energies are needed. Approximations in 
the scoring function can lead to deleterious effects, especial- 
ly when using implicit solvent models, and quantitative esti- 
mations of free energy changes upon antigen binding are still 
very difficult, even in the case of small ligands. Most current 
force fields for protein design include many heuristic terms. 
Although qualitative estimations might be sufficient for some 
applications, rigorous free energy evaluations explicitly in- 
cluding the contributions of conformational entropy and 
solvent polarization are challenging issues in computational 
chemistry and biology. Knowledge-based potentials will 
likely also continue to improve as the number of protein 
structures and sequences continues to increase exponentially. 

In terms of improvements of the properties in antibodies, 
such as immunogenicity and solubility, sequence-based 
methods have shown promise in antibody engineering 
(Abhinandan and Martin, 2007; David et al, 2010; Thullier 
et al, 2010), in addition to structure-based methods. The 
sequence-based methods can take advantage of the rapid 
advances in sequencing technology. 

Structure-based vaccine design approaches will also be 
enhanced with increasing knowledge of antibody -antigen 
structures as well as improved computational methods. 
Scaffold-based vaccine design is a promising approach. 
Although high specificity is desired for an antibody drug, the 
importance of poly-reactivity of antibodies as a vaccine can- 
didate has been pointed out (Dimitrov et al, 2011) because 
such antibodies might accommodate high variability of 
single antigens. Thus, understanding the origins of high spe- 
cificity and cross-reactivity of antibodies is important in 
terms of developing protein therapeutics, and further struc- 
tural analyses of antibody -antigen complex structures should 
provide hints to elucidate such mechanisms. 

Overall, although there has been much progress in under- 
standing sequence -structure relationships in antibodies, there 
is still much to learn, especially regarding the mechanisms 
of antibody affinity maturation, such as the relationship 
between somatic mutations and conformational heterogen- 
eity. Other types of protein scaffolds have been explored as 
an alternative to antibodies, and computational methods 
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are likely to play an important role in these undertakings. 
Another area that requires greater attention is the role of 
water molecules in antibody -antigen interfaces (Yokota 
et al, 2003). Water molecules can improve the complimen- 
tary of the interface as seen in the case of affinity matura- 
tions by somatic mutations. Explicit treatment of water 
molecules in the design process will require novel theories or 
technologies, and some such efforts are underway (Jiang 
et al, 2005; Schymkowitz et al, 2005). Finally, it will be 
challenging to simultaneously optimize binding affinity, 
specificity, stability, solubility and other properties needed 
for a successful therapeutic protein. 
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